ABSTRACT

The  Nash  equilibrium  concept  of  non-zero  sum 
games  is  one  possible  option  available  to  military 
planners  seeking  strategies  to  control  large  numbers  of 
autonomous  assets  operating  in  an  adversarial 
environment.  To  implement  the  Nash  strategies  inherently 
necessitates  making  assumptions  on  possible  adversarial 
actions.  However,  the  Nash  concept  suffers  from  one 
major  difficulty  which  limits  its  potential  usefulness.  A 
Nash  equilibrium  may  not  always  exist  in  pure  strategies. 
In  this  paper  we  introduce  the  concept  of  Near-Nash 
strategies  as  a  mechanism  to  overcome  this  difficulty.  We 
then  illustrate  this  concept  by  deriving  the  Near-Nash 
strategies  for  a  military  game  where  a  unique  Nash  is  not 
guaranteed  to  exist.  We  use  these  strategies  as  the  basis 
for  an  intelligent  battle  plan  for  heterogeneous  teams  of 
autonomous  combat  air  vehicles  in  the  Multi-Team 
Dynamic  Weapon  Target  Assignment  model. 

Keywords:  Non-zero  Sum  Games,  Nash  Strategies, 
Near-Nash  Strategies,  Autonomous  Combat  Vehicles, 
Target  Assignment  Problem. 

1.  INTRODUCTION 

As autonomous systems mature from theoretical
capabilities into combat-ready reality, military strategists
have become increasingly interested in finding efficient
command and control solutions which are capable of
intelligently aiding battlefield commanders responsible
for large numbers of autonomous assets (Gerkey and
Mararik, 2004; Diaz et al., 2003, 2006; Steinberg, 2006;
Kumar et al., 2006; Chandler, 2004). Lacking the vital
improvisational abilities of their human counterparts,
these autonomous assets require more in-depth battle
plans and access to robust mission re-planning. Even as
the  military  migrates  from  conventional  forces  into 


smaller,  modular,  and  consequently  more  manageable 
teams  of  assets,  the  additional  planning  needs  of  an 
autonomous  asset  can  greatly  encumber  a  commander. 
Unchecked,  this  increased  workload  could  overwhelm  the 
capabilities  of  a  battlefield  commander,  greatly  reducing 
the effectiveness of the asset. Automated battle command
aids which use game theoretic strategies are one option
which shows considerable promise (Galati and Simaan,
2007; Cruz et al., 2001, 2004; Ganapathy and Passino,
2003; Liu et al., 2003a, b). Because possible adversarial
actions are inherently considered in a game theoretic
analysis, these planners are able to adapt and react to
potential enemy actions in an ever-changing battle space.
Nash strategies (Nash 1950; Basar and Olsder, 1995),
which represent an equilibrium in which neither side
benefits from unilaterally deviating from a given strategy
pair, provide the predominant methodology for these
efforts. Despite their potential usefulness, planners which
rely on the Nash equilibrium suffer from one major
difficulty: a Nash equilibrium in pure strategies is not
always guaranteed to exist. As a result, it is important
that the planner be designed to handle such a situation.
The search for Nash strategies can also be very time
consuming, especially in games where the decision spaces
are very large. These planners must therefore have an
alternative search strategy that accounts for the possible
nonexistence of Nash strategies.

In  this  paper  we  introduce  the  concept  of  Near-Nash 
strategies  as  a  mechanism  to  overcome  this  difficulty.  We 
then  illustrate  this  concept  by  deriving  the  Near-Nash 
strategies  for  heterogeneous  teams  of  autonomous  combat 
air  vehicles  as  an  attempt  to  intelligently  aid  a  commander 
in  the  planning  of  a  military  air  operation.  We  explore  the 
Multi-Team  Dynamic  Weapon  Target  Assignment  model, 
and  the  Near-Nash  strategy  concept  to  compute  an 
intelligent  battle  plan  which  accounts  for  the  possible 
actions  of  the  enemy. 
2. NASH STRATEGIES IN BATTLEFIELD
SCENARIOS

A  military  battlefield  is  an  extremely  demanding 
environment.  The  combination  of  uncertainty,  deceit,  and 
the  fluidity  of  a  scenario  can  cause  great  difficulty  even 
for  the  most  seasoned  military  commander.  History  has 
shown  that  the  strategies  that  commanders  employ  often 
have  a  larger  impact  on  the  outcome  of  a  battle  than  the 
composition  and  sizes  of  the  forces  at  their  disposal. 
Often,  a  brilliant  strategy  has  seized  victory  from 
seemingly  impossible  odds.  On  the  other  hand,  errors  in 
judgment  have  also  resulted  in  unexpected  disaster. 

History has also shown that scenarios in which a
smaller force defeated a larger force are often the result of
asymmetrical situational awareness (Smith 2007). In
many cases, the winning commander was found to have
an accurate picture of the force layout and strategy of
his/her adversary, while the losing commander either did
not account for the adversary or made many incorrect
assumptions about it. As a result, the losing commander’s
actions were often ineffective or even counterproductive,
while the winning commander’s strategies worked to great
effect, thus allowing the smaller force to overcome the
larger force.

Autonomous  assets  are  in  a  sense  naive  entities, 
lacking  the  vital  improvisational  skills  inherent  to  their 
human  counterparts.  These  assets  are  expected  to  operate 
in  a  chaotic,  hostile,  and  ever  changing  battlefield;  the 
same  battlefield  that  has  proven  to  be  so  difficult  to  their 
human  counterparts.  Because  of  this  inherent  naivete,  it  is 
likely  that  any  human  commander  will  have  a  much  more 
accurate  and  complete  view  of  the  battlefield  than  an 
autonomous  asset.  As  a  result,  unmanned  assets  are  far 
more  vulnerable  to  misdirection  and  are  more  likely  to  be 
deceived.  There  is  a  significant  danger  of  an  intelligent 
adversary  confusing  an  autonomous  asset  in  such  a  way 
as  to  induce  it  to  perform  in  an  ineffective  or  even 
counterproductive  manner. 

To  achieve  maximum  effectiveness,  automated 
planners  must  find  robust  strategies  that  take  this  inherent 
naivete  into  account.  While  it  is  highly  unlikely  that 
autonomous  planners  will  outperform  their  human 
counterparts  in  the  near  future,  these  planners  must 
attempt  to  mitigate  the  risk  of  an  autonomous  asset  being 
induced  to  act  in  a  counterproductive  manner. 

For this reason, the Nash equilibrium (Nash 1950) of
non-zero sum games has been a natural approach to
automating the decision making process in battlefield
planners (Galati and Simaan, 2007; Cruz et al.,
2001, 2004; Ganapathy and Passino, 2003; Liu et al.,
2003). It is defined for scenarios involving multiple
decision  makers,  each  having  their  own  decision  space 
and  objective  function  which  generates  a  measure  of  the 
attractiveness  of  each  possible  combination  of  decisions 
for  each  decision  maker.  Furthermore,  each  decision 


maker  is  assumed  to  have  either  absolute  knowledge  or  an 
estimate  of  the  objective  function  of  the  other  decision 
maker. Under these conditions, a pair of strategies for two
decision makers engaged in a game is a Nash equilibrium
if no decision maker has an incentive to unilaterally alter
its given strategy. In other words, each decision maker’s
strategy is optimal for the assumed strategy of the other
decision maker.

Nash  strategies  have  an  inherent  robustness  which 
makes  them  attractive  to  autonomous  planners.  A  Nash 
strategy  is  not  optimal  in  the  global  sense;  a  property 
which  often  precludes  it  from  achieving  the  best  possible 
outcome.  It  is  optimal  in  the  sense  that  it  represents  the 
best  possible  strategy  against  an  intelligent  opponent  who 
is  acting  in  a  similar  manner.  While  it  is  true  that  there  is 
no  guarantee  that  an  adversary  will  act  in  an  intelligent 
manner,  it  is  possible  to  say  (with  a  sufficiently  accurate 
scenario  model),  that  an  adversary  who  acts  in  a  manner 
other  than  Nash  will  produce  for  itself  a  worse  outcome 
than  if  it  had  acted  in  accordance  with  a  Nash  strategy. 

3.  DIFFICULTIES  WITH  THE  NASH 
STRATEGIES 

The  Nash  equilibrium  has  many  attractive  properties 
which  would  seem  to  promote  its  widespread 
implementation.  However,  the  traditional  implementation 
of  the  Nash  equilibrium  has  two  significant  drawbacks 
which  have  limited  its  use  in  many  practical  scenarios:  (1) 
potential  non-existence  and  (2)  potential  non-uniqueness. 
As an illustrative example, consider the following non-zero
sum game between Players A and B employing strategies
$u_A \in \{1, 2, 3, 4\}$ and $u_B \in \{1, 2, 3, 4\}$ as shown in Table 1.
Player A wants to maximize $J_A(u_A, u_B)$ and Player B
wants to maximize $J_B(u_A, u_B)$. In Table 1 the * denotes
the optimal choice for a player given the particular choice
of strategy by the other player. For example, if Player B
chooses $u_B = 1$, then Player A’s optimal choice is $u_A = 1$,
as denoted by the * on $J_A^*(1,1) = 5$. Clearly, we see that
the strategy pair $\{u_A, u_B\} = \{2, 2\}$ is a Nash equilibrium, as
neither player may benefit from unilaterally deviating
from its strategy in this pair.

We  should  note  that  while  the  construction  of  this 
matrix  is  somewhat  simple  for  many  applications,  the 
relatively  unstructured  nature  of  military  conflict 
combined  with  the  possibly  large  number  of  available 
assets  creates  a  vast  and  complex  decision  space,  even  for 
relatively  small  scenarios.  Determining  the  Nash 
equilibrium  in  such  a  space  within  the  time  frame  of  the 
evolution  of  the  battlefield  becomes  a  significantly 
challenging  problem  (Galati,  Simaan  and  Liu,  2003). 

A  potential  issue  associated  with  the  Nash 
equilibrium  in  planning  applications  for  autonomous 
combat  vehicles  is  that  there  is  no  general  guarantee  that  a 


single  Nash  Equilibrium  in  pure  strategies  always  exists 
for  a  given  scenario.  Examining  the  matrix  shown  in 
Table  1,  we  can  easily  construct  examples  in  which  the 
optimal  reaction  sets  do  not  intersect  as  well  as  examples 
in  which  they  intersect  at  more  than  one  point. 


Table 1 - Sample Game Matrix with a Single Nash
(each cell lists J_A(u_A,u_B), J_B(u_A,u_B); * marks an optimal response)

           u_B = 1     u_B = 2     u_B = 3     u_B = 4
u_A = 1    5*, 15      6, 4        14, 1       0, 20*
u_A = 2    1, 7        10*, 8*     4, 4        6, 2
u_A = 3    4, 6        2, 7*       15, 5       8*, 3
u_A = 4    3, 10*      9, 2        16*, 5      7, 5

For example, if the value of $J_B(2,4)$ in Table 1 is
altered from 2 to 10, the game will no longer have a Nash
equilibrium. On the other hand, if $J_B(3,4)$ is altered from
3 to 8, the game will have two Nash equilibria, {2,2} and
{3,4}. This presents a problem: a Nash strategy is only
robust because it is the optimal response to a given
strategy. This robustness property disappears if there is no
incentive for one’s adversary to select that strategy, or if
there is an incentive to select another. Game theory makes
no prediction as to the outcome if two players choose
strategies from different Nash equilibrium points.
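
The existence and uniqueness claims for Table 1 can be checked by brute force. The following Python sketch is an illustration only and is not part of the original study: it enumerates every strategy pair and keeps those from which neither player can gain by a unilateral change. The helper names `pure_nash` and `with_entry` are hypothetical and introduced here for the example.

```python
from copy import deepcopy

# Payoffs from Table 1: rows are u_A = 1..4, columns are u_B = 1..4.
J_A = [[5, 6, 14, 0], [1, 10, 4, 6], [4, 2, 15, 8], [3, 9, 16, 7]]
J_B = [[15, 4, 1, 20], [7, 8, 4, 2], [6, 7, 5, 3], [10, 2, 5, 5]]


def pure_nash(ja, jb):
    """Return every pure-strategy Nash pair (u_A, u_B), numbered from 1."""
    rows, cols = range(len(ja)), range(len(ja[0]))
    pairs = []
    for a in rows:
        for b in cols:
            # Player A has no better row against column b.
            a_is_best = ja[a][b] == max(ja[i][b] for i in rows)
            # Player B has no better column against row a.
            b_is_best = jb[a][b] == max(jb[a][j] for j in cols)
            if a_is_best and b_is_best:
                pairs.append((a + 1, b + 1))
    return pairs


def with_entry(matrix, u_a, u_b, value):
    """Copy a payoff matrix and overwrite the entry at (u_A, u_B)."""
    changed = deepcopy(matrix)
    changed[u_a - 1][u_b - 1] = value
    return changed


original = pure_nash(J_A, J_B)
no_equilibrium = pure_nash(J_A, with_entry(J_B, 2, 4, 10))
two_equilibria = pure_nash(J_A, with_entry(J_B, 3, 4, 8))
print(original, no_equilibrium, two_equilibria)
```

Exhaustive enumeration is only practical for small matrices such as this one; it is used here purely to confirm the three cases discussed in the text.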

In non-zero sum matrix games with random entries it
has been shown that the probability of existence of
exactly $K$ Nash equilibria is $p_{Nash}(K) = e^{-1}/K!$ as the size
of the game becomes infinitely large (Stanford 1995).
Thus in random games, the probability of existence of
more than one Nash equilibrium is 26% and the
probability of the game having no Nash equilibrium is
37%. While we recognize that military engagements can
hardly be modeled as random games, we can infer that
there is a considerable risk that both non-unique and
non-existent Nash equilibria may occur even in situations
where only the adversary’s entries in the matrix are
random (Peterson and Simaan, 2008).
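
The two percentages quoted above follow directly from the limiting formula $e^{-1}/K!$. As a quick arithmetic check (an illustrative sketch; the function name `p_nash` is hypothetical):

```python
import math


def p_nash(k):
    """Limiting probability of exactly k pure Nash equilibria: e^-1 / k!."""
    return math.exp(-1) / math.factorial(k)


p_none = p_nash(0)                   # no Nash equilibrium
p_one = p_nash(1)                    # exactly one Nash equilibrium
p_multiple = 1.0 - p_none - p_one    # more than one Nash equilibrium
print(f"{p_none:.3f} {p_one:.3f} {p_multiple:.3f}")
```

The probability of no equilibrium and of exactly one equilibrium are equal in this limit, and the remainder is the probability of non-uniqueness.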


4.  NEAR-NASH  STRATEGIES 

When a Nash equilibrium does not exist, one is
tempted to look for alternative strategies that may have
similar properties. This is problematic because game
theoretic models do not share the same principles as
standard optimization problems. In classical
optimization problems the objective functions are
generally quadratic (concave) around the optimum,
implying that strategies located near the optimal point can
be assumed to be near optimal. Nash strategies, in
contrast, are defined only as an equilibrium point and do
not necessarily possess a concavity property for nearby
strategies. This means that while it is often acceptable to
use strategies that are near the optimal strategy in
classical optimization problems, it is difficult to predict
the outcome of strategies that are near a Nash equilibrium.

There has been some work in dealing with games
with no Nash equilibrium. The most significant
advancement has been the Epsilon-equilibrium (Everett
1957). Most commonly applied to stochastic games, an
Epsilon-equilibrium is defined by a constant $\varepsilon$. A strategy
pair is said to be an Epsilon-equilibrium for a given $\varepsilon$ if
no player can improve its objective function by more than
$\varepsilon$ by unilaterally deviating from the given strategy. While
this is a useful strategy concept, it is not possible to
guarantee a priori that an Epsilon-equilibrium exists for a
given $\varepsilon$. Selecting an $\varepsilon$ that is too high may result in a less
intelligent strategy choice than is otherwise available. On
the other hand, selecting an $\varepsilon$ that is too low may result in
cases where no satisfying strategies may be found.
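
To make the sensitivity to $\varepsilon$ concrete, the sketch below lists the Epsilon-equilibria of the Table 1 game after $J_B(2,4)$ is changed from 2 to 10, the variant noted earlier to have no Nash equilibrium. It is an illustration under that assumption, not the authors' code, and the helper name `epsilon_equilibria` is hypothetical.

```python
# Table 1 payoffs with J_B(2,4) changed from 2 to 10 (no Nash equilibrium).
J_A = [[5, 6, 14, 0], [1, 10, 4, 6], [4, 2, 15, 8], [3, 9, 16, 7]]
J_B_MOD = [[15, 4, 1, 20], [7, 8, 4, 10], [6, 7, 5, 3], [10, 2, 5, 5]]


def epsilon_equilibria(ja, jb, eps):
    """Return pairs (u_A, u_B) where no unilateral deviation gains more than eps."""
    rows, cols = range(len(ja)), range(len(ja[0]))
    found = []
    for a in rows:
        for b in cols:
            gain_a = max(ja[i][b] for i in rows) - ja[a][b]
            gain_b = max(jb[a][j] for j in cols) - jb[a][b]
            if gain_a <= eps and gain_b <= eps:
                found.append((a + 1, b + 1))
    return found


for eps in (0, 1, 2):
    print(eps, epsilon_equilibria(J_A, J_B_MOD, eps))
```

With $\varepsilon = 0$ the set is empty, with $\varepsilon = 1$ it contains a single pair, and with $\varepsilon = 2$ it already contains five pairs, which illustrates why a fixed $\varepsilon$ chosen in advance can be either too strict or too permissive.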

To alleviate this problem, we propose a new concept,
which we refer to as the Near-Nash equilibrium, that
expands upon the Epsilon-equilibrium in much the way
that optimization problems benefit from the concept of
“near-optimal” solutions. Essentially, we reformulate the
Nash criteria as an optimization problem which seeks to
minimize the sum of the squared losses that each decision
maker incurs by unilaterally deviating from its optimal
response within a given pair of strategies.

To mathematically define this concept of a Near-Nash
equilibrium, consider a two player game between
two decision makers, Players A and B. Assume that
Player A’s optimal response to a given strategy $u_B$ of
Player B is $u_A^*(u_B)$, which is determined as follows:

$$J_A(u_A^*(u_B), u_B) = \max_{u_A \in U_A} J_A(u_A, u_B) \qquad (1a)$$

Similarly, Player B’s optimal response $u_B^*(u_A)$ to a
particular strategy $u_A$ of Player A can be derived from:

$$J_B(u_A, u_B^*(u_A)) = \max_{u_B \in U_B} J_B(u_A, u_B) \qquad (1b)$$

Thus, the amount Player A can lose by unilaterally
altering its strategy from the optimal response to a given
strategy $u_B$ of Player B is:

$$\Delta_A(u_A, u_B) = J_A(u_A^*(u_B), u_B) - J_A(u_A, u_B) \qquad (2a)$$

Similarly, Player B’s loss by unilaterally altering its
strategy from the optimal response to a given strategy
$u_A$ of Player A is:

$$\Delta_B(u_A, u_B) = J_B(u_A, u_B^*(u_A)) - J_B(u_A, u_B) \qquad (2b)$$

We note that $\Delta_A(u_A, u_B)$ and $\Delta_B(u_A, u_B)$ are clearly
non-negative quantities. We also note that a Nash equilibrium
pair $\{u_A^N, u_B^N\}$ (whether unique or not) is defined,
necessarily and sufficiently, by the conditions
$\Delta_A(u_A^N, u_B^N) = \Delta_B(u_A^N, u_B^N) = 0$; or

$$J_A(u_A^*(u_B^N), u_B^N) - J_A(u_A^N, u_B^N) = 0 \qquad (3a)$$

$$J_B(u_A^N, u_B^*(u_A^N)) - J_B(u_A^N, u_B^N) = 0 \qquad (3b)$$

As an illustration, for the game defined in Table 1, the
values of $\Delta_A(u_A, u_B)$ and $\Delta_B(u_A, u_B)$ are shown in Table 2:

Table 2: Values of $\Delta_A(u_A, u_B)$ and $\Delta_B(u_A, u_B)$ for the game of Table 1
(each cell lists Δ_A, Δ_B)

           u_B = 1     u_B = 2     u_B = 3     u_B = 4
u_A = 1    0, 5        4, 16       2, 19       8, 0
u_A = 2    4, 1        0, 0        12, 4       2, 6
u_A = 3    1, 1        8, 0        1, 2        0, 4
u_A = 4    2, 0        1, 8        0, 5        1, 5

Similarly, if we compute these values for the modified
game where the value of $J_B(2,4)$ in Table 1 is altered
from 2 to 10, the corresponding table will be identical to
Table 2, except for the second row, which will change as
illustrated in Table 3. We note that this modified game has
no Nash equilibrium.
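
The loss tables can be reproduced mechanically from definitions (2a) and (2b). The sketch below is illustrative only; the helper name `unilateral_losses` is hypothetical. It computes both loss matrices for the Table 1 game and for the modified game in which $J_B(2,4)$ is 10.

```python
# Payoffs from Table 1: rows are u_A = 1..4, columns are u_B = 1..4.
J_A = [[5, 6, 14, 0], [1, 10, 4, 6], [4, 2, 15, 8], [3, 9, 16, 7]]
J_B = [[15, 4, 1, 20], [7, 8, 4, 2], [6, 7, 5, 3], [10, 2, 5, 5]]
# Modified game: J_B(2,4) changed from 2 to 10.
J_B_MOD = [[15, 4, 1, 20], [7, 8, 4, 10], [6, 7, 5, 3], [10, 2, 5, 5]]


def unilateral_losses(ja, jb):
    """Return (delta_a, delta_b) as defined in (2a) and (2b)."""
    rows, cols = range(len(ja)), range(len(ja[0]))
    # (2a): A's best payoff in column b minus A's payoff at (a, b).
    delta_a = [[max(ja[i][b] for i in rows) - ja[a][b] for b in cols]
               for a in rows]
    # (2b): B's best payoff in row a minus B's payoff at (a, b).
    delta_b = [[max(jb[a]) - jb[a][b] for b in cols] for a in rows]
    return delta_a, delta_b


delta_a, delta_b = unilateral_losses(J_A, J_B)
delta_a_mod, delta_b_mod = unilateral_losses(J_A, J_B_MOD)
for row_a, row_b in zip(delta_a_mod, delta_b_mod):
    print(list(zip(row_a, row_b)))
```

Only Player B's losses in the second row differ between the two games, since changing $J_B(2,4)$ affects only B's best payoff against $u_A = 2$.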

Clearly, the fact that Table 3 has no {0,0} entry
confirms that the game has no Nash solution.
Alternatively, assume that the two players wish to find a
pair of strategies which guarantees each player minimum
losses if the other player deviates from its optimal
reaction to its strategy. Since the losses consist of a pair
of numbers, we define a measure of the cumulative loss
by both players by the expression:

$$J(u_A, u_B) = \Delta_A^2(u_A, u_B) + \Delta_B^2(u_A, u_B) \qquad (4a)$$

or

$$J(u_A, u_B) = [J_A(u_A^*(u_B), u_B) - J_A(u_A, u_B)]^2 + [J_B(u_A, u_B^*(u_A)) - J_B(u_A, u_B)]^2 \qquad (4b)$$



Table 3: Values of $\Delta_A(u_A, u_B)$ and $\Delta_B(u_A, u_B)$ for the
modified game of Table 1
(each cell lists Δ_A, Δ_B)

           u_B = 1     u_B = 2     u_B = 3     u_B = 4
u_A = 1    0, 5        4, 16       2, 19       8, 0
u_A = 2    4, 3        0, 2        12, 6       2, 0
u_A = 3    1, 1        8, 0        1, 2        0, 4
u_A = 4    2, 0        1, 8        0, 5        1, 5

This suggests that just as a Nash equilibrium is
characterized by a pair of strategies in which neither
player can gain by unilaterally deviating from it, a
strategy pair which minimizes both players’ cumulative
losses if either player deviates from its optimal reaction to
the other player’s strategy could be defined as being close
to, or near, a Nash equilibrium. Thus we define a pair of
strategies $\{u_A^{NN}, u_B^{NN}\}$ as a Near-Nash strategy pair if:

$$J(u_A^{NN}, u_B^{NN}) = \min_{(u_A, u_B) \in U_A \times U_B} \left\{ [J_A(u_A^*(u_B), u_B) - J_A(u_A, u_B)]^2 + [J_B(u_A, u_B^*(u_A)) - J_B(u_A, u_B)]^2 \right\} \qquad (5)$$


We note that this definition also includes, and can be
used to compute, the Nash equilibrium. Clearly, when the
above minimum is equal to zero, the Near-Nash strategies
will coincide with the Nash strategies. To illustrate this,
Table 4 shows the values of $J(u_A, u_B)$ for all pairs of
strategies in Table 1. The Near-Nash strategies in this case
are $\{u_A^{NN}, u_B^{NN}\} = \{2, 2\}$, resulting in $J(2,2) = 0$, which are
the same as the Nash strategies for this game.


Table 4: Values of $J(u_A, u_B)$ for all strategies in Table 1

           u_B = 1     u_B = 2     u_B = 3     u_B = 4
u_A = 1    25          272         365         64
u_A = 2    17          0           160         40
u_A = 3    2           64          5           16
u_A = 4    4           65          25          26

Now, for the modified game in which the value of
$J_B(2,4)$ in Table 1 is altered from 2 to 10 and for which
there is no Nash equilibrium, the values of $J(u_A, u_B)$ are
tabulated in Table 5.


Table 5: Values of $J(u_A, u_B)$ for all strategies in Table 1 when
$J_B(2,4)$ is altered from 2 to 10

           u_B = 1     u_B = 2     u_B = 3     u_B = 4
u_A = 1    25          272         365         64
u_A = 2    25          4           180         4
u_A = 3    2           64          5           16
u_A = 4    4           65          25          26

Clearly this table indicates that this game has no Nash
equilibrium, and the Near-Nash strategies in this case are
$\{u_A^{NN}, u_B^{NN}\} = \{3, 1\}$, corresponding to the smallest value of
$J(u_A, u_B) = 2$. By using these strategies, each player is
guaranteed a loss of no more than 1 if the other player
deviates from its optimal reaction, since
$\Delta_A(3,1) = \Delta_B(3,1) = 1$. This appears to be a
very appropriate strategy when a Nash equilibrium does
not exist.
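
The Near-Nash definition (5) can be evaluated by exhaustive search on a game of this size. The following sketch is an illustration only, with hypothetical helper names `cumulative_loss_table` and `near_nash`; it builds the table of $J(u_A, u_B)$ from (4a) and returns the minimizing pair.

```python
# Payoffs from Table 1: rows are u_A = 1..4, columns are u_B = 1..4.
J_A = [[5, 6, 14, 0], [1, 10, 4, 6], [4, 2, 15, 8], [3, 9, 16, 7]]
J_B = [[15, 4, 1, 20], [7, 8, 4, 2], [6, 7, 5, 3], [10, 2, 5, 5]]
# Modified game: J_B(2,4) changed from 2 to 10.
J_B_MOD = [[15, 4, 1, 20], [7, 8, 4, 10], [6, 7, 5, 3], [10, 2, 5, 5]]


def cumulative_loss_table(ja, jb):
    """Return J(u_A, u_B) = Delta_A^2 + Delta_B^2 for every pair, as in (4a)."""
    rows, cols = range(len(ja)), range(len(ja[0]))
    table = []
    for a in rows:
        row = []
        for b in cols:
            loss_a = max(ja[i][b] for i in rows) - ja[a][b]
            loss_b = max(jb[a]) - jb[a][b]
            row.append(loss_a ** 2 + loss_b ** 2)
        table.append(row)
    return table


def near_nash(ja, jb):
    """Return ((u_A, u_B), J) for the pair minimizing J, numbered from 1."""
    table = cumulative_loss_table(ja, jb)
    pairs = [(a, b) for a in range(len(table)) for b in range(len(table[0]))]
    a, b = min(pairs, key=lambda p: table[p[0]][p[1]])
    return (a + 1, b + 1), table[a][b]


print(near_nash(J_A, J_B))       # game of Table 1
print(near_nash(J_A, J_B_MOD))   # modified game with no Nash equilibrium
```

A returned value of $J = 0$ indicates that the pair is also a Nash equilibrium; a strictly positive minimum indicates that no pure Nash equilibrium exists. If several pairs tie for the minimum, this sketch simply returns the first one encountered.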

5. INCORPORATING THE NEAR-NASH IN AN
AUTONOMOUS BATTLE PLANNER

Modern military conflict combines a near-infinite
number of strategies and command decisions, as well as
considerable heterogeneity and interdependency in unit
attributes and pervasive uncertainty with regard to units on
both sides. Consequently, it is entirely impractical to
implement an autonomous battlefield planner that
searches for a Nash equilibrium in the classical sense
when it is not known a priori that a Nash equilibrium
exists. An alternative approach would be to compute the
quantity $J(u_A, u_B)$ for all pairs of strategies and search for
either a Nash ($J(u_A, u_B) = 0$) or a Near-Nash solution
(minimum of $J(u_A, u_B)$).

To illustrate the effectiveness of the Near-Nash
equilibrium and to show that it has properties similar to
those of the Nash equilibrium, we use a simplistic model of
battlefield dynamics that corresponds to the Multi-Team
Dynamic Weapon Target Assignment (MT-DWTA) problem
(Galati and Simaan 2007). In this model, two or more teams
of heterogeneous fighting units must each collaborate as a
team to destroy enemy units while preserving friendly units
over a number of targeting rounds (Figure 1). The decision
space for this problem is limited to determining a target for
each asset. This is an extension of the Weapon Target
Assignment (WTA) problem (Matlin 1970) and the
Dynamic Weapon Target Assignment (DWTA) problem
(Murphy 1999) in that each unit acts as both a weapon
against a unit in the other team and a target for a unit in
the other team.

To present a mathematical formulation for this
structure, let the two teams be labeled as Blue (B) and
Red (R) and let $K$ denote the total number of time steps
representing the duration of the battle. Let the number of
non-homogeneous fighting units at step $k$, where
$k = 0, 1, \ldots, K$, in each team be $N_B(k)$ and $N_R(k)$ respectively.


Fig.  1.  MT-DWTA  simplification  of  the  battlefield 

Since every unit in each team is to be assigned a unit
on the other team as a target, the number of possible
target assignments is $N_R(k)$ for each unit on the Blue
team and $N_B(k)$ for each unit on the Red team. These

strategies must be selected for battle steps $k = 0, 1, \ldots, K-1$.
If, at each battle step $k$, a team chooses a strategy based
upon an objective function, we assume that this objective
function will take the form of a weighted sum,
maximizing both the combined worth of the destroyed
units in the other team and the combined worth of the
remaining units in that team. Let these objective functions
at step $k$ be $J_B(u_B(k), u_R(k))$ for the Blue team
and $J_R(u_B(k), u_R(k))$ for the Red team, where
$u_B(k) = [u_{B1}(k), u_{B2}(k), \ldots, u_{BN_B}(k)]'$ and
$u_R(k) = [u_{R1}(k), u_{R2}(k), \ldots, u_{RN_R}(k)]'$ are $N_B(k)$ and
$N_R(k)$ dimensional vectors representing the Blue and Red
team’s respective target assignment strategies at step $k$. The
$i$th entry $u_{Bi}(k) \in \{0, 1, \ldots, N_R(k)\}$ in $u_B(k) \in U_B(k)$
represents the Red target assigned to the $i$th Blue unit. A
similar representation is also employed for the Red team[i].

[i] The choice $u_{Bi}(k) = 0$ or $u_{Rj}(k) = 0$ implies that no target has
been assigned to the $i$th unit in the Blue team or the $j$th unit in
the Red team. The strategy $u(k) = 0 = [0, 0, \ldots, 0]$ is used to
denote that no unit in a given team has been assigned a target.

In the MT-DWTA, each unit may be valued
differently by each team. Let $b_i^B$ denote the worth of the
$i$th Blue unit to the Blue team and let $b_i^R$ denote the
worth of that unit to the Red team. Likewise, let $r_j^B$ and
$r_j^R$ denote the worth of the $j$th Red unit to the Blue and
Red teams respectively. Assume that the probability of
kill of the $i$th Blue unit against the $j$th Red unit at the
$k$th battle step is $p_{ij}^B(k)$. Similarly, let the probability of
kill of the $j$th Red unit against the $i$th Blue unit
be $p_{ji}^R(k)$. Finally, let $B_i(k)$ and $R_j(k)$ denote the
probabilities that the $i$th Blue unit and $j$th Red unit are
alive at the start of the $k$th battle step. Using these
notations we can express the probability of survival of the
$i$th Blue unit and $j$th Red unit as follows:

$$B_i(k) = B_i(k-1) \prod_{j=1}^{N_R(k)} \left[ 1 - p_{ji}^R(k)\, \delta(i - u_{Rj}(k))\, R_j(k-1) \right] \qquad (6a)$$

$$R_j(k) = R_j(k-1) \prod_{i=1}^{N_B(k)} \left[ 1 - p_{ij}^B(k)\, \delta(j - u_{Bi}(k))\, B_i(k-1) \right] \qquad (6b)$$

Consequently, the objective functions for the Blue
and Red teams can be expressed as:

$$J_B(u_B, u_R, k) = \sum_{i=1}^{N_B(k)} b_i^B B_i(k) - \sum_{j=1}^{N_R(k)} r_j^B R_j(k)$$

$$J_R(u_B, u_R, k) = -\sum_{i=1}^{N_B(k)} b_i^R B_i(k) + \sum_{j=1}^{N_R(k)} r_j^R R_j(k) \qquad (7)$$

where the term $\delta(p - q)$ is the Kronecker delta, defined
by $\delta(p - q) = 0$ if $p \neq q$ and $\delta(p - q) = 1$ if $p = q$,
and is used to indicate that unit $q$
of the Blue team has been assigned to target unit $p$ in the
Red team.
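
Equations (6) and (7) amount to a single propagation step followed by two weighted sums. The sketch below is a minimal illustration under stated assumptions: the function names `survival_step` and `objectives`, the two-unit teams, and all numerical values are made up for the example and do not come from the original study. Targets are numbered from 1, and 0 means that no target is assigned.

```python
def survival_step(B_prev, R_prev, p_B, p_R, u_B, u_R):
    """One step of (6a)/(6b). p_B[i][j]: kill probability of Blue i against Red j;
    p_R[j][i]: kill probability of Red j against Blue i."""
    B_next = []
    for i, alive in enumerate(B_prev):
        for j, target in enumerate(u_R):
            if target == i + 1:  # Kronecker delta: Red unit j targets Blue unit i
                alive *= 1.0 - p_R[j][i] * R_prev[j]
        B_next.append(alive)
    R_next = []
    for j, alive in enumerate(R_prev):
        for i, target in enumerate(u_B):
            if target == j + 1:  # Kronecker delta: Blue unit i targets Red unit j
                alive *= 1.0 - p_B[i][j] * B_prev[i]
        R_next.append(alive)
    return B_next, R_next


def objectives(B, R, b_B, r_B, b_R, r_R):
    """Objective functions (7) for the Blue and Red teams."""
    J_B = sum(w * x for w, x in zip(b_B, B)) - sum(w * x for w, x in zip(r_B, R))
    J_R = -sum(w * x for w, x in zip(b_R, B)) + sum(w * x for w, x in zip(r_R, R))
    return J_B, J_R


# Hypothetical scenario: all units alive, both Blue units target Red unit 1,
# Red unit 1 targets Blue unit 2, Red unit 2 has no target.
B0, R0 = [1.0, 1.0], [1.0, 1.0]
p_B = [[0.5, 0.1], [0.4, 0.1]]
p_R = [[0.3, 0.5], [0.2, 0.2]]
B1, R1 = survival_step(B0, R0, p_B, p_R, u_B=[1, 1], u_R=[2, 0])
J_B, J_R = objectives(B1, R1, b_B=[2, 1], r_B=[3, 1], b_R=[1, 1], r_R=[2, 2])
print(B1, R1, J_B, J_R)
```

In this made-up scenario Red unit 1 is engaged by both Blue units, so its survival probability is the product of the two miss probabilities, which shows how concentrating fire enters the objective through (6b).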

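Applying the Near-Nash equilibrium defined in (4, 5)
to the MT-DWTA objective functions as presented in (7),
we are left with the following multi-stage optimization:

$$\min_{(u_B, u_R) \in (U_B \times U_R)} \left\{ [J_B(u_B^*(u_R), u_R) - J_B(u_B, u_R)]^2 + [J_R(u_B, u_R^*(u_B)) - J_R(u_B, u_R)]^2 \right\} \qquad (8)$$

where

$$J_B(u_B^*(u_R), u_R) = \max_{u_B \in U_B} \left\{ \sum_{i=1}^{N_B(k)} b_i^B B_i(k) - \sum_{j=1}^{N_R(k)} r_j^B R_j(k) \right\}$$

$$J_R(u_B, u_R^*(u_B)) = \max_{u_R \in U_R} \left\{ -\sum_{i=1}^{N_B(k)} b_i^R B_i(k) + \sum_{j=1}^{N_R(k)} r_j^R R_j(k) \right\}$$
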


We note that even though the objective functions are
evaluated for the predicted assets remaining at the end of
a single battle-step, the control vectors $\{u_B, u_R\}$ may
extend over multiple battle-steps. For simplicity, in this
paper we will only consider two battle-steps, $k$ and $k+1$,
optimizing for the $(k+1)$th step (in other words, one-step
look-ahead). Thus each control vector may be expressed as:

$$u_B \in U_B \Rightarrow [u_B(k), u_B(k+1)] \in [U_B(k), U_B(k+1)]$$

$$u_R \in U_R \Rightarrow [u_R(k), u_R(k+1)] \in [U_R(k), U_R(k+1)]$$

It  has  been  shown  that  each  of  the  sub-optimizations 
in  (8)  is  NP  Hard  (Murphy  1999).  Consequently,  we 
cannot  solve  (8)  exactly11.  Previously,  we  have  shown  that 


11  We  note  that  the  inability  to  find  a  closed  form  solution  to  (8) 
does  not  prevent  it  from  acting  as  a  test  case  for  Near-Nash 


ULTRA,  a  neighborhood  search  technique  which  attempts 
to  improve  a  given  strategy  by  modifying  the  target 
assignments  of  one  or  more  assets  (Galati  et.  al.  2003)  is 
capable  of  quickly  determining  target  assignments  that  are 
on  average  95%  optimal  for  scenarios  approaching  200 
individual  assets. 

To  find  a  Near-Nash  strategy  for  the  MT-DWTA,  we 
will  use  a  tit-for-tat  or  action/re-action  search.  We  first 
assume  a  two  step  strategy  by  the  Red  Team111, 

oy{ur  (k\uR  (A:  + 1)}° .  The  Blue  team  then  calculates 

[uB  (k) , uB  {k  + 1)}°  using  the  ULTRA  algorithm  (Galati  et. 
al.  2003)  to  find  the  Optimal  Response  to  Red’s  initial 
strategy.  Red  then  calculates [uR  (k),uR  (^  +  l)}1  as  the 

optimal  response  to[uB  (k),uB  (&  +  l)}° .  This  process 

iterates  until  on  of  three  possible  terminating  conditions 
are  reached: 

[uB(k),uB(k  +  \))r  ={uB(k),uB(k  +  \))t  ‘or 
{uR(k),uR(k  +  \)\  ={uR(k),uR(k  +  \)\  ' 

{uB(k),uB(k  +  l)}r  ={uB(k),uB(k  +  l)}T  9 or  (JO) 

[uR  (k),uR(k+ 1)}"  =  [uR  (k),uR(k  +  l)}r  ° 

where  6  >  2  and  r  >  maximum  number  of  iterations. 

Thus  the  algorithm  will  terminate  in  a  cycle  of 
either  1,  6 ,  or  r  iterations.  After  one  of  these  three 
terminating  conditions  has  been  reached,  we  select  the 
pair  of  strategies  within  this  cycle  which  are  closest  to  a 
Nash  Equilibrium.  Having  used  this  algorithm 
extensively,  we  can  make  several  observations  that  are 
not  analytically  provable,  but  are  useful  nonetheless: 

While  (10)  is  theoretically  possible,  we  find  that  this 
algorithm  typically  converges  with  r  <  10  .  This  is  because 
there  are  a  small  number  of  individual  optimal  responses 
for  the  entire  set  of  adversarial  strategies. 

We  have  found  that  0  <  3 ,  though  is  typically  confined 
to  1  or  2.  We  note  that  if  the  terminating  condition  in  (10) 
is  reached,  then  the  given  strategies  are  by  definition  a 
Nash  strategy  pair  as  each  strategy  is  an  optimal  response 
to  the  other. 


planners.  On  the  contrary,  this  is  representative  of  the  conditions 
most  battlefield  planners  will  have  to  contend  with.  By 
illustrating  that  near-Nash  strategies  perform  almost,  but  not 
quite,  as  well  as  pure  Nash  Equilibriums  in  a  manner  similar  to 
the  way  sub-optimal  approximations  are  almost,  but  not  quite 
optimal,  we  further  enhance  our  claim  that  near-Nash  based 
strategies  have  many  of  the  properties  of  Nash  equilibrium 
without  the  rigorous  requirements. 

iii  Though  either  is  acceptable,  for  simplicity  and  without  loss  of
generality  we  will  assume  that  the  Red  Team  provides  the  initial 
strategy. 


4.  EVALUATING  THE  EFFECTIVENESS  OF 
NEAR  NASH  STRATEGIES 

To  verify  the  effectiveness  of  the  Near  Nash 
equilibrium  as  a  solution  concept  we  will  examine 
interactions  of  the  following  three  strategy  types:  Near- 
Nash  against  Near-Nash,  Near-Nash  against  Optimal 
Response  (denoted  by  a  *  superscript),  and  Near-Nash 
against  Team  Optimal  Response  (denoted  by  an  o 
superscript). We note that the target assignment vectors
are to be selected from discrete spaces labeled $U^x_B(k)$
and $U^x_R(k)$, containing $S^x_B(k)$ and $S^x_R(k)$ possible
target assignment strategies available to each team
respectively at step k.

A strategy $\{u^*_B(k), u^*_B(k+1)\} \in U^*_B(k)$ is defined as
an Optimal Response strategy for the Blue team given an
announced strategy by the Red team $u^a_R(k)$, over a look-
ahead horizon d=1 if at step k it satisfies the inequality:

$$J_B\big(\{u^*_B(k),u^*_B(k+1)\},\{u^a_R(k),u_R(k+1)\}\big) \geq J_B\big(\{u_B(k),u_B(k+1)\},\{u^a_R(k),u_R(k+1)\}\big) \quad \forall\, \{u_B(k),u_B(k+1)\} \in U^*_B(k) \qquad (11a)$$

Likewise a strategy $\{u^*_R(k), u^*_R(k+1)\} \in U^*_R(k)$ is
defined as an Optimal Response strategy for the Red
team given an announced strategy by the Blue team
$u^a_B(k)$, over a look-ahead horizon d=1 if at step k it
satisfies the inequality:

$$J_R\big(\{u^a_B(k),u_B(k+1)\},\{u^*_R(k),u^*_R(k+1)\}\big) \geq J_R\big(\{u^a_B(k),u_B(k+1)\},\{u_R(k),u_R(k+1)\}\big) \quad \forall\, \{u_R(k),u_R(k+1)\} \in U^*_R(k) \qquad (11b)$$

We note that the Optimal Response strategy is
calculated with full knowledge of the adversary’s intended
strategy. We also note that the Optimal Response strategy
for a team depends only on the strategy announced by its
opponent for the current battle step. Its objective function
can be decoupled during the second battle step.
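The definition in (11a) amounts to an exhaustive search over a finite strategy set once the opponent's announced strategy is fixed. A minimal sketch follows, assuming a toy objective in which only the first-step term depends on the announced strategy (mirroring the decoupling remark above). The tables `FIRST_STEP` and `SECOND_STEP` and the function names are hypothetical placeholders, not the paper's objective function.

```python
from itertools import product

# Hypothetical payoff tables: FIRST_STEP[own][announced] is Blue's payoff at
# step k; SECOND_STEP[own] is Blue's payoff at step k+1, which in this toy
# model does not depend on what Red announced.
FIRST_STEP = [[2.0, 0.0], [1.0, 3.0]]
SECOND_STEP = [0.5, 1.0]


def objective_blue(own, announced):
    """Toy J_B for a two-step strategy `own` = (u_B(k), u_B(k+1))."""
    return FIRST_STEP[own[0]][announced] + SECOND_STEP[own[1]]


def optimal_response(candidates, announced, objective):
    """Return the candidate that maximizes the objective against `announced`."""
    return max(candidates, key=lambda own: objective(own, announced))


# All two-step strategies when each step offers the choices 0 and 1.
CANDIDATES = list(product(range(2), repeat=2))
```

For example, `optimal_response(CANDIDATES, 1, objective_blue)` searches the four two-step strategies against the announced Red strategy 1. Exhaustive search is used here only for clarity; it does not represent the ULTRA algorithm.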

A strategy $u^o_B(k) \in U^o_B(k)$ is called a Team
Optimal strategy for the Blue team at step k if it is
selected such that $J_B(u^o_B(k), 0, k) \geq J_B(u_B(k), 0, k)$ for
all $u_B(k) \in U^o_B(k)$. Similarly, a strategy $u^o_R(k) \in U^o_R(k)$ is
called a Team Optimal strategy for the Red team if it is
selected such that $J_R(0, u^o_R(k), k) \geq J_R(0, u_R(k), k)$ for
all $u_R(k) \in U^o_R(k)$.

We  note  that  the  Team  Optimal  strategy  is  one  that 
completely  ignores  the  adversarial  nature  of  the  other 
team  and  considers  it  only  as  a  set  of  target  units.  It 
represents  the  standard  non-game  based  solution  to  the 
target  assignment  problem  (Murphy  1999). 
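A Team Optimal strategy can be illustrated with the classic static weapon-target assignment objective, in which the opposing team is treated purely as a set of targets. The sketch below is an assumption-laden stand-in: it minimizes the expected surviving target worth by brute force, which may differ from the objective in equation (7), and the names `pk`, `worth`, and `team_optimal` are hypothetical.

```python
from itertools import product


def expected_surviving_worth(assignment, pk, worth):
    """Expected total worth of targets that survive one salvo.

    assignment[i] is the target index chosen by weapon i,
    pk[i][j] is the probability that weapon i kills target j,
    worth[j] is the worth of target j.
    """
    survive = [1.0] * len(worth)
    for weapon, target in enumerate(assignment):
        survive[target] *= 1.0 - pk[weapon][target]
    return sum(w * s for w, s in zip(worth, survive))


def team_optimal(pk, worth):
    """Brute-force the assignment that leaves the least expected target worth.

    The search ignores any action the targets might take, which is what
    distinguishes it from the game-based strategies. It enumerates
    len(worth) ** len(pk) assignments, so it only suits small examples.
    """
    n_weapons, n_targets = len(pk), len(worth)
    return min(
        product(range(n_targets), repeat=n_weapons),
        key=lambda a: expected_surviving_worth(a, pk, worth),
    )
```

For a 10 against 10 engagement the enumeration has 10^10 candidates, so a practical planner would need a heuristic or dedicated solver rather than this exhaustive search.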


To demonstrate the effectiveness of Near-Nash
strategies on a MT-DWTA problem, we conducted a
series of Monte Carlo simulations on a scenario where a
team of 10 Red units was engaged with a team of 10
Blue units. We assumed that the two 10 × 10 matrices of
probabilities of kill of Blue against Red and Red against
Blue have entries that are random numbers uniformly
distributed in the interval [0, 1]. The objective functions
for each team were structured as described in equations
(7) and the unit worth values b*, b*, r*, r* were also
randomly and independently selected in the range [0, 1]
with uniform probability distributions (see footnote iv). To obtain valid
aggregate results, we performed 30,000 runs for each
combination of strategies and averaged the results.
Each run used a different seed for all of its random
numbers. The results of this simulation are tabulated in
Table 6.
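The sampling scheme described above can be sketched as follows. This is a hedged outline rather than the simulation actually used: `simulate` is a hypothetical callable that plays out one engagement for a given pair of strategy types and returns the fractions of Blue and Red forces remaining, and the sketch draws a single worth vector per team as a simplification.

```python
import random
from statistics import mean


def random_instance(seed, n=10):
    """Draw one random scenario; every value comes from a generator seeded per run."""
    rng = random.Random(seed)

    def uniform_matrix():
        # rng.random() samples from [0, 1), used here for the interval [0, 1]
        return [[rng.random() for _ in range(n)] for _ in range(n)]

    return {
        "pk_blue_vs_red": uniform_matrix(),
        "pk_red_vs_blue": uniform_matrix(),
        "worth_blue": [rng.random() for _ in range(n)],
        "worth_red": [rng.random() for _ in range(n)],
    }


def monte_carlo(simulate, runs, n=10):
    """Average (blue_remaining, red_remaining) over `runs` seeded scenarios."""
    results = [simulate(random_instance(seed, n)) for seed in range(runs)]
    return mean(r[0] for r in results), mean(r[1] for r in results)
```

Seeding each run separately keeps every scenario reproducible, so the same instances can be replayed for each combination of strategies being compared.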


Table 6 - Comparing Near-Nash vs. other strategies

                        Combination of Strategies Employed
% of initial        near-Nash (Blue) vs   near-Nash (Blue) vs      near-Nash (Blue) vs
forces remaining    near-Nash (Red)       Optimal Response (Red)   Team Optimal (Red)

Blue                9%                    8%                       12%
Red                 9%                    10%                      7%

Examining  Table  6,  we  can  see  that  the  case  of  Near- 
Nash  against  Near-Nash  yields  results  very  similar  to  the 
results  of  Near-Nash  versus  Optimal  Response.  In 
contrast,  we  see  that  there  is  a  more  substantial  difference 
between  the  results  of  Near-Nash  versus  Near-Nash  and 
those  obtained  from  Near-Nash  versus  Team  Optimal 
strategies. In a true Nash equilibrium, because neither team
has an incentive to alter its strategy, the Optimal
Response strategy will yield identical results to that of the
Nash Equilibrium. Thus we can conclude from the
simulations  that  the  Near-Nash  approach  yields  results 
very  close  to  those  expected  from  a  true  Nash  Equilibrium 
even  though  such  an  equilibrium  does  not  always  exist  or 
cannot  be  found. 

iv  We acknowledge that military assets do not have random
probability of kill vectors. Various assets tend to have distinct
strengths and vulnerabilities. However, we find that Nash
strategies fare much better in structured scenarios (Galati and
Simaan 2007). Therefore, using random probability of kill
matrices represents a worst case scenario for near-Nash
strategies.


5.  CONCLUSION

Combat systems of the future will begin to deploy
unmanned assets in ever larger numbers as the capabilities
of unmanned systems evolve. This migration from
manned to unmanned will result in many changes in the
military’s force structure. One major change will be the
relationship between commanders and unmanned assets.
Whereas today’s military often assigns multiple
commanders to a single unmanned asset, the force
structure of the future will require a single commander to
intelligently control multiple autonomous assets.

One  commander  is  not  capable  of  performing  all  of 
the  functions  necessary  to  control  multiple  autonomous 
assets  in  the  same  manner  as  today’s  unmanned  systems. 
Most  likely,  the  commander  of  the  future  will  rely  on 
battle  planning  software  to  augment  their  abilities  and  to 
automate  many  planning  functions.  However,  there  is  a 
large  technology  gap  between  what  is  available  now  and 
what  is  needed  in  the  future.  A  great  deal  of  work  must  be 
done  to  bridge  this  gap. 

Planning  aids  based  upon  game  theoretic  concepts, 
the  Nash  equilibrium  in  particular,  are  one  promising 
avenue  of  research.  However,  because  of  the  uncertainty 
as  to  the  existence  of  the  equilibrium  point,  and  the 
increased  domain  knowledge  required  to  conduct  such  an 
analysis,  there  is  often  a  desire  to  use  simple,  naive 
strategy  options  that  do  not  reason  about  possible 
adversarial  actions. 

In this paper we sought to answer these common
criticisms and to justify future research into game
theoretic planning for unmanned assets. We introduced
the concept of Near-Nash strategies to overcome the
possibility that a unique Nash equilibrium may not exist. We applied
the Near-Nash concept to the MT-DWTA, a
representative example of a battle space, using the
ULTRA algorithm and a tit-for-tat action/reaction type
iterative search. We then compared the performance of a
Near-Nash based strategy to the Optimal Response, Team
Optimal, and Near-Nash strategies. Using a series of
Monte Carlo simulations, we demonstrated that the Near-
Nash strategies are justifiable in that they yield results
comparable to what a genuine Nash equilibrium would
yield.